Investigation of acute encephalitis syndrome with implementation of metagenomic next generation sequencing in Nepal

Background The causative agents of Acute Encephalitis Syndrome (AES) remain unknown in 68–75% of cases. In Nepal, cases are tested only for Japanese encephalitis, which accounts for only about 15% of cases. However, several organisms, including vaccine-preventable etiologies, can cause acute encephalitis; identifying them could direct public health efforts for prevention, including addressing gaps in vaccine coverage. Objectives This study employs metagenomic next-generation sequencing (mNGS) to investigate the etiologies underlying acute encephalitis syndrome in Nepal. Methods We investigated 90 Japanese-encephalitis-negative, banked cerebrospinal fluid samples that were collected as part of a national surveillance network in 2016 and 2017. Randomization was done to include three age groups (< 5 years; 5–14 years; > 15 years). Only limited metadata (age and gender) were available. The investigation was performed in two batches and included total nucleic acid extraction, followed by individual library preparation (DNA and RNA) and sequencing on an Illumina iSeq100. The genomic data were interpreted using Chan Zuckerberg ID and confirmed by polymerase chain reaction. Results Human alphaherpesvirus 2 and Enterovirus B were detected in one sample each. These hits were confirmed by qPCR and semi-nested PCR, respectively. Interpretation of most of the other samples was hampered by low pathogen abundance, possible freeze-thaw cycles, and a lack of process controls and associated clinical metadata. Conclusion In this study, mNGS revealed two documented causative agents. Insufficient clinical metadata and process controls, low pathogen abundance, and the absence of standard procedures for collecting and storing samples in nucleic acid protectants could have impeded the study and introduced ambiguity when correlating the identified hits with infection. Therefore, standardized procedures for sample collection, including process controls and clinical metadata, are needed. Despite challenging conditions, this study highlights the usefulness of mNGS for investigating diseases of unknown etiology and for guiding the development of adequate clinical management algorithms and outbreak investigations in Nepal.


Background
Acute Encephalitis Syndrome (AES) is defined by acute onset of fever and a change in mental status (including symptoms such as confusion, disorientation, coma, or inability to talk) and/or new onset of seizures (excluding simple febrile seizures) in a person of any age at any time of year [1]. This term was coined by the World Health Organization (WHO) in 2008 [1]. Globally, based on various studies, the incidence of AES has ranged from 3.5 to 7.4 per 100,000 patient-years, with a higher incidence among children [2].
Patients suffering from AES usually present with acute onset of fever and altered sensorium. This is followed by rapidly worsening clinical conditions and death [3]. Survivors can suffer from long-term health issues, including neurological sequelae [3,4]. The etiologies causing AES can be infectious or non-infectious, with the infectious category comprising a broad range of organisms (bacteria, viruses, parasites) [2,5]. The causative agents of AES also vary with season and geographic location [6]. Research has shown that the etiologies of AES remain unknown in 68-75% of cases, while Japanese encephalitis (JE) constitutes about 15% of cases [7][8][9][10]. The landscape of AES, in terms of etiology, has changed in India as well, where outbreak investigations and surveillance studies have increasingly reported non-JEV etiologies [11].
In Nepal, JE is a major cause of mortality and morbidity among children [12]. Therefore, since 2004, the Ministry of Health and Population of Nepal, supported by the Office of Infection Prevention Division, World Health Organisation (WHO), has integrated JE surveillance with Acute Flaccid Paralysis, Neonatal Tetanus, and Measles in its National Surveillance Network [13]. Until 2011, over 23,000 AES cases were reported by the surveillance network [14]. Due to a lack of knowledge of etiology, AES cases are tested only for JE, and clinical management is based on this result. Undiagnosed AES etiology contributes to a high rate of death and morbidity [14]. Several etiologies, including vaccine-preventable etiologies, could cause acute encephalitis, and their identification could direct public health efforts for prevention, including expanded use of vaccines or addressing gaps in vaccine coverage. Herpes Simplex Virus (HSV), Varicella-Zoster Virus (VZV), Enterovirus, Adenovirus, and Rubella, as well as emerging pathogens such as Nipah, Chandipura and Chikungunya, have all been reported as causative viral agents of AES, while Neisseria meningitidis, Streptococcus pneumoniae, Listeria sp., and Brucella have been reported as causative bacterial agents [2,15–17].
While molecular methods such as PCR require prior genetic information on causative agents, genomic methods such as metagenomic Next Generation Sequencing (mNGS) can simultaneously identify infections and co-infections of varying origin, even at very low abundance, in a single investigation and assist in investigating the transmission of such infections [18,19]. With the recent dramatic decrease in sequencing costs, this technology provides access to genomic information at a scale that can be used to fill gaps in routine clinical practice and address epidemiological questions. In addition to identification (identifying genotypes, virulence or pathogenesis), NGS provides information for epidemiological investigation (comparative genomics, phylogenetic analysis) [20][21][22][23].
This study, employing mNGS to explore the infective etiologies behind AES, complements a growing number of studies that have used a similar approach to investigate encephalitis, including in a Low- and Middle-Income Country (LMIC) context [16,24–26]. The identification of such etiologies is an important step in developing effective prevention and treatment measures, which in turn will reduce disability and morbidity.

Sample collection and selection
The investigation included a random selection of 90 retrospective cerebrospinal fluid (CSF) samples that were collected by World Health Organization-Immunization Preventable Diseases (WHO-IPD) as a part of the National AES Surveillance Network, in collaboration with the Family Welfare Division (FWD), in 2016 and 2017. The samples were collected throughout Nepal from individuals suffering from AES. These samples had been tested for JE at the National Public Health Laboratory (NPHL) and stored in a -80 °C freezer. For this study, only samples that tested negative for JE were selected. Randomization was done to include three age groups: < 5 years, 5-14 years, and > 15 years. Only limited metadata related to the subjects (age and gender) were known. Each sample was assigned a unique study code to maintain privacy.

Nucleic acid extraction and mNGS
Total Nucleic Acid was extracted from the CSF samples using Zymo Quick-DNA/RNA™ Pathogen MiniPrep (R1042).
The total nucleic acid samples were aliquoted into two sub-samples for RNA and DNA library preparation, respectively. RNA libraries were prepared using the NEB Library Prep Kit Ultra II RNA (New England Biolabs, E7770S), and DNA libraries using the NEB Library Prep Kit Ultra II FS DNA (New England Biolabs, E7805S). The library preparation for the first 30 samples was done in a single batch (at Chan Zuckerberg Biohub, USA), while the remaining 60 samples were done in three batches of 20 samples each (at Dhulikhel Hospital, Kathmandu University Hospital, Nepal). Negative extraction and library preparation controls were included in each batch. The library preparation used 10 ng of input nucleic acid, followed by fragmentation, adapter ligation, cleanup (Solid Phase Reversible Immobilization beads), barcoding and amplification of the library for 12-16 cycles. With subsequent library preparations, quality control was done using agarose gel electrophoresis and the TapeStation 4200 platform from Agilent Technologies, and later by qPCR using the Kapa Illumina Library Amplification (KK2702) and Quantitation Complete Kit (KK4923). Libraries were required to have a DNA fragment length of around 350-400 bp and a concentration > 1 nM. In RNA library preparation, External RNA Controls Consortium (ERCC) RNA Spike-in controls (4456740) were used as internal controls.
The libraries that passed quality control filters were pooled and run on an Illumina iSeq100 sequencer. Sequencing was performed with 2 × 146 bp reads using custom unique dual indices of 12 bp length. PhiX (5%) was added as an internal control for sequencing. The loading concentration of the pooled libraries was maintained at 100-120 pM.
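The library quality thresholds and loading concentration described above can be illustrated with a short calculation. The following is a minimal sketch, not the procedure used in this study: it assumes the common approximation of 660 g/mol per base pair of double-stranded DNA, and the function names and example values are hypothetical.

```python
# Approximate average molar mass of one base pair of double-stranded DNA (g/mol).
# This is a widely used approximation and an assumption of this sketch.
AVG_DA_PER_BP = 660.0


def library_molarity_nm(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a dsDNA library concentration (ng/uL) to molarity (nM)."""
    if conc_ng_per_ul < 0 or mean_fragment_bp <= 0:
        raise ValueError("concentration must be >= 0 and fragment length > 0")
    return conc_ng_per_ul * 1e6 / (AVG_DA_PER_BP * mean_fragment_bp)


def passes_qc(conc_ng_per_ul: float, mean_fragment_bp: float,
              min_nm: float = 1.0, size_range=(350, 400)) -> bool:
    """Apply the thresholds stated in the text: ~350-400 bp and > 1 nM."""
    in_size_range = size_range[0] <= mean_fragment_bp <= size_range[1]
    return in_size_range and library_molarity_nm(conc_ng_per_ul, mean_fragment_bp) > min_nm


def dilution_factor(stock_nm: float, target_pm: float = 110.0) -> float:
    """Fold dilution needed to bring a pooled library from stock_nm to target_pm."""
    stock_pm = stock_nm * 1000.0
    if target_pm <= 0 or target_pm > stock_pm:
        raise ValueError("target must be positive and not exceed the stock concentration")
    return stock_pm / target_pm
```

For example, a hypothetical library at 0.25 ng/µL with a mean fragment length of 375 bp is just above the 1 nM threshold, and a 2.2 nM pool would need a 20-fold dilution to reach 110 pM, the midpoint of the stated loading range.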

Data analysis
The analysis was performed on the CZ ID (formerly known as IDSeq) platform developed by the Chan Zuckerberg Initiative and CZ Biohub. CZ ID accepts raw sequencing data and performs host and quality filtration, followed by execution of an assembly-based alignment pipeline [27]. The samples were analysed based on reads per million (rPM; number of reads aligning to the taxon in the NCBI NR/NT database, per million reads sequenced), reads (number of reads aligning to the taxon in the NCBI NT/NR database), contig number (number of assembled contigs aligning to the taxon in the NCBI NT/NR database), id% and z-score. The samples were also visualized using an rPM heatmap in which samples and controls are cross-matched against each other. Respective background models for the RNA and DNA libraries were created from the negative extraction and library preparation controls [28]. The primer sets were designed using NCBI Primer-BLAST and GenScript, then checked with Beacon Designer Free and SnapGene Viewer.
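To make the rPM and z-score metrics concrete, the sketch below computes both from read counts. It is an illustration under stated assumptions rather than the CZ ID implementation: the background model is taken to be the mean and sample standard deviation of rPM across negative controls, and both the cap on the z-score and the handling of a zero-variance background are assumptions of this sketch.

```python
from statistics import mean, stdev


def reads_per_million(taxon_reads: int, total_reads: int) -> float:
    """Reads aligning to a taxon, normalized per million reads sequenced."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return taxon_reads * 1_000_000 / total_reads


def z_score(sample_rpm: float, background_rpms: list, cap: float = 99.0) -> float:
    """Z-score of a sample rPM against rPM values from negative controls.

    The cap and the zero-variance rule below are assumptions for this
    sketch, not documented platform behaviour.
    """
    mu = mean(background_rpms)
    sigma = stdev(background_rpms)  # requires at least two controls
    if sigma == 0:
        # Taxon is flat across controls: report the capped extreme or zero.
        if sample_rpm > mu:
            return cap
        if sample_rpm < mu:
            return -cap
        return 0.0
    z = (sample_rpm - mu) / sigma
    return max(-cap, min(cap, z))
```

A taxon that is absent from every control but present in a sample reaches the cap under these assumptions, whereas a common contaminant with high and variable rPM in the water controls yields a low z-score even when its rPM in the sample is large.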

Human alphaherpesvirus confirmation was done by qPCR (KAPA HiFi HotStart Ready Mix) using two primer sets: established primers (FP: 5' TGCAGTTTACGTATAACCACATACAGC 3' and RP: 5' AGCTGCGGGCCTCGTT 3') and self-designed primers (FP: 5' GACTCA
Similarly, for confirmation of Enterovirus, a modified protocol with established primers from the Enterovirus Surveillance Guidelines was used to perform semi-nested Polymerase Chain Reaction (snPCR) [29]. For confirmation, the bands were then visualized by agarose (1.5%) gel electrophoresis.

Subject metadata
The samples selected for this study were banked, retrospective CSF samples collected in 2016 and 2017, with metadata limited to age and gender. Out of the 90 subjects, 31 (34.4%) were female; the age distribution is presented in Table 1.

Sample collection
The samples were collected in glass bottles without any preservative and transported to NPHL, where they were first tested for JE and subsequently stored at -80 °C. As these samples were collected in 2016 and 2017 and banked, negative controls were not available for collection and transportation.

Nucleic acid extraction and mNGS
The extracted nucleic acid had a concentration ranging from too low to detect to 222 ng/µL. As the analysis was done in two sets, each set was processed for DNA and RNA library preparation and is presented accordingly. Out of 90 samples, only two showed confirmed hits, for Enterovirus B and Human alphaherpesvirus 2, respectively. However, these two targets were not evaluated in the other samples because of insufficient sample volume.

mNGS of RNA libraries
The results from the RNA libraries showed some distinct organism hits in CZ ID and also provided a broad picture of the landscape of taxa across the samples. The following heatmaps were generated from the RNA libraries.
In Figs. 1 and 2, the heatmaps show the top organism hits, many of which are seen at similar levels in the water controls as well. The genus Pseudomonas is seen in all of the samples, including a few negative controls. A similar trend was seen, in both sets, for other organisms such as Sphingomonas, Acinetobacter, Escherichia and others.
Interestingly, only AES_S47_RNA showed a hit to Enterovirus B (strain Human coxsackievirus B1). This hit was particular to sample 47 and was not seen in any negative control. The metrics, namely an rPM of 359,409.1 (the abundance of a specific microbe within the sample), NT L (the length of the aligned sequence in base pairs), a Z score of 99 (the significance of a hit compared to the background), coverage visualization (the breadth and depth of read coverage) and an id of 85.8%, indicate that the hit is highly similar to the reference organism [30]. The figure below shows the abundance of Enterovirus B in Sample 47 (NT rPM ≥ 10 and NT L ≥ 50). The coverage breadth of this hit was 98.7% with a depth of 700.4x, as seen in Fig. 3.
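Coverage breadth and depth, as reported for this hit, can be derived from per-base read depth along the reference. The following is a minimal sketch with hypothetical function names; it assumes half-open alignment intervals and averages depth over the whole reference, which may differ from how the platform computes its reported values.

```python
def depth_from_alignments(ref_len: int, alignments: list) -> list:
    """Per-base depth from (start, end) alignment intervals, end exclusive."""
    depths = [0] * ref_len
    for start, end in alignments:
        # Clip each alignment to the reference boundaries before counting.
        for pos in range(max(0, start), min(ref_len, end)):
            depths[pos] += 1
    return depths


def coverage_metrics(depths: list) -> tuple:
    """Return (breadth, mean depth) for a per-base depth profile.

    Breadth is the fraction of reference positions covered by at least
    one read; mean depth is averaged over all reference positions.
    """
    if not depths:
        raise ValueError("depth profile is empty")
    covered = sum(1 for d in depths if d > 0)
    return covered / len(depths), sum(depths) / len(depths)
```

Under this definition, a breadth of 98.7% means that nearly every position of the reference was covered by at least one read, while a depth of about 700x means that each position was covered by roughly 700 reads on average.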
The strain Human coxsackievirus B1 from our study was different (0.0569) from the Enterovirus B isolates from an outbreak in northern India, close to Nepal [62] (Fig. 5).

mNGS of DNA libraries
In mNGS of the DNA libraries, a hit was observed for Human alphaherpesvirus 2 [AES_S28_DNA] in the first set. The same sample showed a hit for Human alphaherpesvirus 1, but at very low abundance, as shown in Fig. 4. Additionally, background contaminants (laboratory and hospital) were seen in the water controls in this DNA sequencing result as well. Similar to the RNA libraries, most of the samples showed hits for Sphingomonas spp., Pseudomonas spp., and Acinetobacter spp. Fig. 6 shows the result of the hit, with 2,597.6 rPM for Human alphaherpesvirus 2. However, due to low coverage, contig visualization was not available for this hit.

PCR confirmation

Confirmation of human alphaherpesvirus 2
Of the two primer sets used, the established primers performed better, giving a Ct value of 23.11 for Human alphaherpesvirus 2.

Confirmation of Enterovirus B
After completion of snPCR for Enterovirus, the band was seen between 700 and 800 bp after the first amplification and between 300 and 400 bp after the final amplification. This confirmed the presence of Enterovirus as per the Enterovirus Surveillance Guidelines [36].

Demography of acute encephalitis syndrome (AES)
Most of the subjects with AES were young males, with a median age of 20 years. This gender distribution was consistent with previous studies on the epidemiology of AES in Nepal [37,38]. AES affects individuals of both genders and all ages; however, most studies have been done in younger populations, as they are at high risk due to a lack of developed antibodies [39][40][41][42]. Another study done in Nepal also observed a young median age (19 years) for AES, while others observed an older population [37,38,43,44].

Fig. 4 Phylogenetic comparison of AES_S47_RNA against other coxsackievirus B1 genomes from NCBI. This tree was made using CLUSTAL W and the maximum likelihood statistical method, with the Tamura-Nei model and nearest neighbor interchange as the maximum likelihood heuristic method

Metagenomic next generation sequencing
In this study, out of the 90 samples tested, no specific hit could be identified in most of them (n = 88). This was due to a high level of background contaminants, resulting in low confidence in calling organism hits within the experimental samples. Nevertheless, two samples showed confirmed hits for Enterovirus B and Human alphaherpesvirus 2, respectively, which contrasts with studies showing that non-JE pathogens constitute 68-75% of AES cases [7][8][9][10]. Because the ERCC spike-ins were amply sequenced from the RNA libraries, the absence of a causative agent in the remaining samples could indicate that the samples either did not have intact nucleic acid to start with, had low pathogen abundance, or had been degraded [16,45].
As per the mNGS results, the high Z score (99) for Enterovirus B shows that the organism is significantly present in our sample compared to the background. The average alignment length (as shown by the L metric) is long (L = 7258.7), which indicates a good local alignment to the reference [46]. The id% is also high (85.8%), meaning that the organism is highly similar to the reference organism in the database. Additionally, the sequenced genome shows good coverage breadth and depth (breadth of 98.7% and depth of 700x), which describe the range and uniformity of sequencing coverage for the particular hit [45]. The presence of Enterovirus B was also confirmed through snPCR followed by visualization of the product size specific for all enteroviruses [36].
The hit for Human alphaherpesvirus 2 was considered significant because it was not present in the control samples and met the thresholds used to analyze the sample (high Z score of 100, L value of 128.9, id% of 99.9%), which are considered reliable [24,27,47]. The low contig value for this hit could be because the organism was present at such a low abundance that the sequencer did not produce enough reads to generate a contig. The contig value depends on the total number of reads and the size of the organism's genome [48]. Additionally, the decreased sensitivity of mNGS due to low pathogen abundance has been studied for CSF [49]. Several methods have been reported that can be used to increase the abundance of pathogen sequences or remove unwanted host sequences [50,51]. Nevertheless, as this genus is associated with encephalitis, the sample was taken further for analysis [52,53]. During confirmation, the low Ct value of 23.11 indicated the presence of Human alphaherpesvirus 2, a known causative agent, in the sample.
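The dependence of contig generation on read count and genome size can be illustrated with the classical Lander-Waterman approximation, in which expected depth is reads × read length / genome length and the expected fraction of the genome covered is 1 − e^(−depth). This is a generic sketch under idealized assumptions (uniform, independent read placement); it is not how the assembler in the analysis pipeline decides whether to emit a contig, and the function names are hypothetical.

```python
import math


def expected_depth(n_reads: int, read_len_bp: int, genome_len_bp: int) -> float:
    """Expected mean depth under uniform random read placement."""
    if genome_len_bp <= 0:
        raise ValueError("genome length must be positive")
    return n_reads * read_len_bp / genome_len_bp


def expected_breadth(depth: float) -> float:
    """Expected fraction of the genome covered by at least one read."""
    return 1.0 - math.exp(-depth)
```

As a hypothetical example, 50 reads of 146 bp on a genome of roughly 155 kb (an approximate size assumed here for a herpes simplex virus genome) give an expected depth of about 0.05x, so reads rarely overlap and contigs are unlikely to form, whereas the same 50 reads on a genome of about 7.4 kb (approximate for an enterovirus) give roughly 1x.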

Enterovirus B and Human alphaherpesvirus 2
Enterovirus B is a known causative agent of encephalitis [16,54–56]. Enteroviruses are named for their transmission route through the intestine [57]. Studies have shown that enteroviruses can cause various diseases of the nervous system in children, including aseptic meningitis, acute paralysis, encephalitis and meningo-encephalomyelitis, among others [58][59][60]. Additionally, strain B1 has been documented to cause encephalomyocarditis (meningoencephalitis and severe myocarditis, often accompanied by heart failure) and showed genomic similarity to the Enterovirus B from our study [61]. Interestingly, various studies in India have linked Enterovirus, among other pathogens, to AES [62][63][64]. An Enterovirus outbreak was first reported from Uttar Pradesh, India in 2006, with seasonal outbreaks with high fatality occurring for several years [62,65,66]. Southern Nepal borders Uttar Pradesh, India, and given the open border and similar climate, it is plausible to find Enterovirus in CSF samples in Nepal. However, the strain of Enterovirus from our study was significantly different from the genomes from the outbreak [62]. Additionally, some studies in Nepal have reported Enterovirus as a possible etiology of AES [67,68].
Similarly, Human alphaherpesvirus 2 is known to cause encephalitis in neonates and immunocompromised patients. Herpes simplex encephalitis (HSE) has significant morbidity and mortality, even with early diagnosis and treatment [69,70]. HSV is found to be one of the predominant causes of AES in the western world [71][72][73]. Among HSE cases, the vast majority are caused by HSV-1, with HSV-2 being the etiology in less than 10% of cases [70]. Studies in India and Nepal have reported the presence of HSV-2 as a causative agent of encephalitis, with a varying range of incidence [69,74–78].
We believe that the identification of etiologies of AES other than JEV, namely HSV-2 and Enterovirus B in our study, argues for the development of inclusive testing strategies. For instance, qPCR testing could be done for HSV-2 and Enterovirus B using designed primers/probes or commercially available qPCR kits.

Clinical data and process control
However, due to the lack of clinical metadata, the presence of Enterovirus B and Human alphaherpesvirus 2 could not be clinically correlated. Clinical metadata such as onset of fever, date of infection, fatality, WBC counts, treatment regimes, adjoining infection, etc. are vital for correlating sequencing hits with the presence of infection [16,79,80].
Additionally, usual environmental contaminants such as Sphingomonas spp., Pseudomonas spp., or Acinetobacter spp. were seen. Sphingomonas species are widely distributed in nature and have been isolated from various land and water habitats, as well as from plant root systems, clinical specimens, and other sources. This is essentially due to their ability to survive in low concentrations of nutrients [81,82]. Background contaminants of laboratory and hospital origin were also seen in the water controls. With appropriate use of background or negative controls, a background model can be created and subsequently subtracted from the results [16,24].

Collection procedures
The lack of identification of a causative agent in the other 88 samples could be because all of the samples analysed dated from 2016 and 2017 and could have gone through numerous freeze-thaw cycles. Collecting the samples in a nucleic acid protectant such as Zymo DNA/RNA Shield would have protected the nucleic acid from degradation after sampling [83,84]. Additionally, the causative agents could have cleared from the cerebrospinal fluid prior to collection, depending on the time elapsed since the onset of fever; it is advised to collect CSF within seven days of the onset of fever [85].
The possibility of freeze-thaw cycles affecting sample quality and the lack of clinical metadata limit the analysis, resulting in ambiguous interpretation of some samples. However, we contend that these aspects should not be regarded as limitations, because the CSF samples analysed were not collected specifically for mNGS and the pathogen itself could have been present at low abundance. Additionally, the sequencing was done on an Illumina iSeq100, which has a maximum of approximately 4 million reads per run and can only accommodate a limited number of organisms with adequate coverage breadth and depth [86]. Therefore, deeper sequencing on a sequencer with more reads per run, together with host depletion and pathogen enrichment methods, can be applied for samples with low pathogen abundance [50,51].
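The effect of a limited read budget on low-abundance pathogens can be shown with simple arithmetic. The sketch below is illustrative only: the run size of about 4 million reads and the 5% PhiX fraction come from the text, while the number of libraries pooled per run and the example rPM are hypothetical, since the pooling scheme is not stated.

```python
def reads_per_library(run_reads: int, n_libraries: int,
                      phix_fraction: float = 0.05) -> float:
    """Average reads available per library after reserving reads for PhiX."""
    if n_libraries <= 0 or not 0 <= phix_fraction < 1:
        raise ValueError("need n_libraries > 0 and 0 <= phix_fraction < 1")
    return run_reads * (1.0 - phix_fraction) / n_libraries


def expected_taxon_reads(rpm: float, library_reads: float) -> float:
    """Expected read count for a taxon present at a given reads per million."""
    return rpm * library_reads / 1_000_000
```

With a hypothetical pool of 40 libraries on a 4 million read run, each library receives on average fewer than 100,000 reads, so a pathogen present at 10 rPM would be expected to contribute about one read, which is too few to call a hit with confidence or to assemble a contig.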

Impact of the study
The identification of the causative etiologies behind Acute Encephalitis Syndrome (AES) is crucial for developing clinical management algorithms, enhancing surveillance, and formulating treatment and prevention policies. Through this study, we advocate for the use of unbiased mNGS to explore the etiologies of under-investigated and undiagnosed febrile illnesses. We do not anticipate the routine use of mNGS as a standard diagnostic tool, but recommend it as a valuable investigational and exploratory instrument for identifying causative etiologies and developing molecular diagnostic methods, such as qPCR.

Conclusion
Identification and investigation of the etiologies behind AES is essential for developing clinical management algorithms, improving surveillance with region-specific treatment and prevention policy, and supporting outbreak investigation. We do not expect the adoption of mNGS as a regular diagnostic tool, but rather as an investigational and exploratory tool to identify causative etiologies and develop molecular methods (such as qPCR) for diagnosis. Thus, this study advocates for the utilisation of unbiased mNGS to investigate the etiologies of under-investigated and undiagnosed febrile illness.
In this study, two documented causative agents were revealed through metagenomic next generation sequencing and subsequently confirmed by PCR. Insufficient clinical metadata and process controls, and the possibility of freeze-thaw cycles affecting sample quality, introduce ambiguity when correlating identified pathogens with infections. Therefore, there is a dire need to implement standardized collection and storage procedures, including proper process controls and clinical metadata (WBC count, primary diagnosis, discharge type, presence of another organism). Additionally, it is recommended that CSF samples be collected in a protectant and transported in a controlled and sterile environment.

Fig. 2 Heatmap depicting hits for Enterovirus (unnamed genus taxon 12059 and 138949) from sequencing of AES_S47_RNA. The names of the samples are at the top of the heatmap. The samples marked in green are water controls from extraction (NEC) and library preparation (NLC). The heatmap was generated using thresholds of NT rPM (nucleotide reads per million) ≥ 10 and NT L (alignment length in base pairs: length of the aligned sequence) ≥ 50

Fig. 3 Coverage visualization of Enterovirus in AES_S47_RNA from CZ ID, depicting the coverage metrics of the contigs generated for the particular hit along with coverage depth and breadth. The hit was visualized with thresholds of NT rPM ≥ 10 and NT L ≥ 50

Fig. 6 Result from CZ ID showing the details of the top hits in AES_S28_DNA along with various metrics related to the hit. The hits were visualised with thresholds of NT rPM ≥ 10 and NT L ≥ 50

Fig. 5 Phylogenetic comparison of AES_S47_RNA against coxsackievirus B genomes isolated from an outbreak in India, based on partial 5' noncoding region sequences. This tree was made using CLUSTALW.

Table 1
Age distribution among subjects with AES
In this study, subjects AES_28 and AES_47 were a 24-year-old female and a 25-year-old female, respectively. The median age of the subjects with AES was 20 years (IQR: 4-79 years)